library(tidyverse)
library(patchwork)
library(lubridate)
library(tsibble)
library(feasts)
library(forecast)
library(sandwich)
library(lmtest)
library(nycflights13)
library(blsR)
theme_set(theme_minimal())
To start with this homework, you will be using the same data that
Jeffrey uses in the lecture – US flights data. The data comes from the
package nycflights13.
Our goal with the tasks in this question is to familiarize you with some of the key programming concepts related to time series data – setting time indexes and key variables, grouping and indexing on those variables, and producing descriptive plots of data that is stored in a time series form.
In the package declarations, we have loaded the
nycflights13 package. This provides three objects that we
are going to use: flights, airports, and weather. You can investigate
these objects further by prefixing them with ? to access their documentation.
As stored, both flights and weather are
“plain” data frames. To begin, cast the flights
dataset into a time series dataset, a tsibble.
time_index: there is very good handling of dates inside
the lubridate package, and RStudio makes a nice one-page cheatsheet
available. For this task you might be looking for
lubridate::make_datetime.
key: we need to define a key because in some cases more
than one flight leaves at the same time – the granularity of our time
measure is the minute, and it is possible
for two planes to leave within the same minute.
Data is created below, and I checked for missing values.
There is more than one way to create the time index; I experimented with a couple and settled on the method below.
#flights
#?flights
# Check for missing values in the departure timestamp components
if (nrow(flights) == sum(!is.na(flights$time_hour) & !is.na(flights$minute))) {
print("No missing values for departure time y-m-d-h-m")
}
## [1] "No missing values for departure time y-m-d-h-m"
flights$time_index <-
flights$time_hour + lubridate::minutes(flights$minute)
flights_ts <-
tsibble::as_tsibble(
flights,
key = c(carrier, flight),
index = time_index
)
flights_ts
## # A tsibble: 336,776 x 20 [1m] <America/New_York>
## # Key: carrier, flight [5,725]
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 11 3 1531 1540 -9 1653 1725
## 2 2013 11 4 1539 1540 -1 1712 1725
## 3 2013 11 5 1548 1540 8 1708 1725
## 4 2013 11 6 1535 1540 -5 1657 1725
## 5 2013 11 7 1549 1540 9 1733 1725
## 6 2013 11 8 1539 1540 -1 1706 1725
## 7 2013 11 9 1535 1540 -5 1714 1723
## 8 2013 11 10 1538 1540 -2 1718 1725
## 9 2013 11 11 1527 1540 -13 1710 1725
## 10 2013 11 12 1535 1540 -5 1709 1725
## # ℹ 336,766 more rows
## # ℹ 12 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>, time_index <dttm>
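As an aside, the lubridate::make_datetime route mentioned in the hint should produce the same index. A minimal sketch, assuming the flights columns printed above (hour and minute are the scheduled departure components, matching time_hour):

```r
# Sketch: build the same minute-level index from the component columns.
alt_index <- lubridate::make_datetime(
  year = flights$year, month = flights$month, day = flights$day,
  hour = flights$hour, min = flights$minute, tz = "America/New_York"
)
# all.equal(alt_index, flights$time_index) should hold after the mutate above.
```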
Using ggplot, create a plot of the number of flights per
month. What, if anything, do you note about the total volume of flights
throughout the year? (Don’t worry if the plot doesn’t tell you anything
interesting about the data. This data is pretty… boring.)
The plot shows a relatively stable volume of flights throughout the year. Total flight volume is lowest in the winter months (January and February) and then slightly increases and remains high throughout the summer and fall, demonstrating a minor degree of annual seasonality common in the travel industry.
flights_ts %>%
index_by(month_index = yearmonth(time_index)) %>%
summarize(monthly_sum = n()) %>%
ggplot(aes(x = month_index,
y = monthly_sum)) +
geom_line() +
labs(title = "Monthly number of flights departing NYC",
x = "Month",
y = "Number of Flights")
Is there a difference in flights to tropical destinations throughout the year? Use the following concept of a tropical destination:
A tropical destination is one who is “in the tropics” – that is, they are located between the Tropic of Cancer and the Tropic of Capricorn.
Using the airports dataset, create a new variable,
is_tropical, that notes whether the destination airport is
at a tropical latitude.
Then try a group_by call that groups
on month and is_tropical. Why does this not
work? What is happening when grouping by month while also
having a time index?
Instead, use tsibble::index_by
and combine it with a lubridate “extractor” to pull the
time object that you want out of the time_index variable
that you created: group_by(is_tropical), and
index_by the month that you extract from your
time_index. (This is a bit of a strange part of the
geom_* API, but this might be a useful place to use the
geom_step geometry to highlight changes in this
series.)
Using group_by(month, is_tropical) on the tsibble doesn’t work because a tsibble requires a valid time index column (time_index) for its output. When we group by a lower-granularity variable (month) while the time index is at a higher granularity (the minute), there is no single correct time index for the resulting summarized rows.
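If all you want is a plain calendar-month count, dropping back to an ordinary tibble removes the index constraint, so the naive group-by works again. A small sketch:

```r
# Dropping the index: as_tibble() strips tsibble's index requirement,
# so a plain monthly count is allowed again.
flights_ts %>%
  tibble::as_tibble() %>%
  dplyr::count(month, name = "total_flights")
```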
Comment on the plots: there are airports in the flights data that are not in airports. Looking into the setdiff leads to 4 airport codes that are all located in the tropics, so I assigned all of those flights to tropical destinations.
I created two versions of the plot because the tropical series is small compared to the non-tropical one, so the second plot gives a better visual sense of the seasonality.
Overall, the volume of flights to non-tropical destinations is overwhelming and shows high stability throughout the year, with some seasonal variation.
Flights to tropical destinations clearly exhibit annual seasonality, peaking during the colder months (December/January) when people are seeking warm weather, and also showing a slight increase during the summer months.
#?airports
#airports
airports_augmented <- airports %>%
mutate(is_tropical = ifelse(
lat >= -23.5 & lat <= 23.5,
"tropical",
"non-tropical"
))
#unique(airports_augmented$is_tropical)
flights_airports_ts <-
flights_ts %>%
left_join(airports_augmented, by = c("dest" = "faa")) %>%
# Handle the NA values created by missing airport codes (SJU, STT, BQN, PSE).
# Since these known missing airports are tropical, we replace NA with "tropical".
mutate(
is_tropical = replace_na(is_tropical, "tropical")
) %>%
# Convert the final, cleaned column to a factor
mutate(is_tropical = factor(is_tropical))
#flights %>%
# group_by(month, is_tropical) %>%
# summarise(total_flights = n())
flights_airports_ts %>%
group_by(is_tropical) %>%
index_by(month_index = yearmonth(time_index)) %>%
summarize(total_flights = n()) %>%
ggplot(aes(x = month_index,
y = total_flights,
             color = is_tropical)) +
geom_step(linewidth = 1) +
labs(title = "Monthly Flight Volume to Tropical vs. Non-Tropical Destinations",
x = "Month",
y = "Total Flights",
color = "Destination Type"
)
flights_airports_ts %>%
group_by(is_tropical) %>%
index_by(month_index = yearmonth(time_index)) %>%
summarise(
total_flights = n()) %>%
  # Because the tropical series is much smaller than the non-tropical one,
  # facet with free y scales to make its seasonality visible
ggplot(aes(x = month_index, y = total_flights)) +
geom_step(linewidth = 1) +
# Add facetting to separate the two series vertically
facet_grid(is_tropical ~ ., scales = "free_y") +
labs(
title = "Monthly Flight Volume by Destination Type",
x = "Month",
    y = "Total Flights"
  )
summary(flights_airports_ts$is_tropical)
## non-tropical tropical
## 328467 8309
# Which flight destinations are missing from the airports table?
# setdiff(unique(flights_ts$dest), unique(airports_augmented$faa))
# Returns four codes (SJU, STT, BQN, PSE), all in the tropics.
Our goal in this question is to ask you to re-apply what you know about producing time series objects to very similarly structured data.
Turn your attention to the weather data that is provided in the
nycflights13::weather dataset. Produce a
tsibble that uses time as a time index, and
origin as a key for this data. You will notice that there
are three origins, “EWR”, “JFK” and “LGA”.
(Hint: We anticipate that you are going to see the following error on the first time that you try to convert this data frame:
Error in `validate_tsibble()`:
A valid tsibble must have distinct rows identified by key and index.
Please use `duplicates()` to check the duplicated rows.
Run `rlang::last_error()` to see where the error occurred.
This is a very helpful error, with a helpful error message.
If you see this error message, we suggest doing as the message suggests
and looking into the duplicates() function to determine what
the issue is. Once you have found the issue, (1) document the issue; (2)
propose a solution that seems reasonable; and, (3) implement your
proposed solution and keep it moving to answer this question.
Anticipated Issue:
The base nycflights13::weather data is said to contain duplicates, but I didn’t run into the error message, nor did
weather %>% duplicates(key = origin, index = time_hour)
produce any rows. So the data looks okay.
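Had duplicates() returned rows, one reasonable fix (a hypothetical sketch, not the only option) would be to keep a single observation per origin/hour before casting:

```r
# Sketch: resolve hypothetical duplicates by keeping the first row
# per (origin, time_hour) pair; averaging the readings would also be defensible.
weather_deduped <- weather %>%
  dplyr::distinct(origin, time_hour, .keep_all = TRUE)
```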
# weather %>%
# mutate(time_index = make_datetime(year, month, day, hour)) %>%
# duplicates(key = origin, index = time_index) %>%
# glimpse()
#weather
#?weather
# Okay if this returns 0 rows
weather %>%
duplicates(key = origin, index = time_hour)
## # A tibble: 0 × 15
## # ℹ 15 variables: origin <chr>, year <int>, month <int>, day <int>, hour <int>,
## # temp <dbl>, dewp <dbl>, humid <dbl>, wind_dir <dbl>, wind_speed <dbl>,
## # wind_gust <dbl>, precip <dbl>, pressure <dbl>, visib <dbl>,
## # time_hour <dttm>
weather_ts <-
as_tsibble(
weather,
key = origin,
index = time_hour
)
weather_ts
## # A tsibble: 26,115 x 15 [1h] <America/New_York>
## # Key: origin [3]
## origin year month day hour temp dewp humid wind_dir wind_speed
## <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 EWR 2013 1 1 1 39.0 26.1 59.4 270 10.4
## 2 EWR 2013 1 1 2 39.0 27.0 61.6 250 8.06
## 3 EWR 2013 1 1 3 39.0 28.0 64.4 240 11.5
## 4 EWR 2013 1 1 4 39.9 28.0 62.2 250 12.7
## 5 EWR 2013 1 1 5 39.0 28.0 64.4 260 12.7
## 6 EWR 2013 1 1 6 37.9 28.0 67.2 240 11.5
## 7 EWR 2013 1 1 7 39.0 28.0 64.4 240 15.0
## 8 EWR 2013 1 1 8 39.9 28.0 62.2 250 10.4
## 9 EWR 2013 1 1 9 39.9 28.0 62.2 260 15.0
## 10 EWR 2013 1 1 10 41 28.0 59.6 260 13.8
## # ℹ 26,105 more rows
## # ℹ 5 more variables: wind_gust <dbl>, precip <dbl>, pressure <dbl>,
## # visib <dbl>, time_hour <dttm>
With this weather data, produce the following figure of the temperature every hour, for each of the origins.
This figure contains five separate plots: the full year; January; July; the first week of January; and the first week of July.
You might think of these plots as “zooming in” on the time series to show more detail.
In your workflow, first create each of the plots. Then, use the
patchwork package to compose each of these plots into a
single figure.
After you produce this figure, comment on what you notice at each of these scales and the figure overall.
Commentary
Entire Year: The entire series is dominated by strong annual seasonality, showing a large, smooth sinusoidal pattern. All three airport temperatures track closely, indicating small temperature differences across the NYC metropolitan area.
Monthly (January and July): At this scale, the daily cycle becomes apparent. January shows low but fluctuating temperatures. July shows consistently warm temperatures, clearly illustrating the seasonal contrast.
Weekly/Daily: The daily cycle is dominant. Temperature rises during the day and falls at night in a smooth, predictable manner. High-frequency noise and short-term volatility are visible at this finest scale.
yearly_plot <-
weather_ts %>%
ggplot(aes(
x = time_hour,
y = temp,
color = origin
)) +
geom_line(linewidth = 0.25) +
labs(
title = "Full Year Weather Data",
x = "Time",
    y = "Temperature (Fahrenheit)"
)
yearly_plot
january_plot <-
weather_ts %>%
filter(month(time_hour) == 1) %>%
ggplot(aes(
x = time_hour,
y = temp,
color = origin
)) +
geom_line(linewidth = 0.2) +
labs(
    title = "2013 January Temperature",
    x = "Time - Hour",
    y = "Temperature (Fahrenheit)"
)
january_plot
july_plot <-
weather_ts %>%
filter(month(time_hour) == 7) %>%
ggplot(aes(
x = time_hour,
y = temp,
color = origin
)) +
geom_line(linewidth = 0.2) +
labs(
    title = "2013 July Temperature",
    x = "Time - Hour",
    y = "Temperature (Fahrenheit)"
)
july_plot
january_first_week <-
weather_ts %>%
filter((month(time_hour) == 1) & (day(time_hour) <= 7)) %>%
ggplot(aes(
x = time_hour,
y = temp,
color = origin
)) +
geom_line(linewidth = 0.5) +
labs(
    title = "2013 January First Week Temperature",
    x = "Time - Hour",
    y = "Temperature (Fahrenheit)"
)
january_first_week
july_first_week <-
weather_ts %>%
filter((month(time_hour) == 7) & (day(time_hour) <= 7)) %>%
ggplot(aes(
x = time_hour,
y = temp,
color = origin
)) +
geom_line(linewidth = 0.5) +
labs(
    title = "2013 July First Week Temperature",
    x = "Time - Hour",
    y = "Temperature (Fahrenheit)"
)
july_first_week
patch_fig <-
yearly_plot /
(january_plot | july_plot) /
(january_first_week | july_first_week) +
plot_annotation(
title = 'Temperature at New York City Airports',
subtitle = 'Many Different Views',
tag_levels = 'A') &
labs(x = NULL, y = 'Temperature')
patch_fig
At the hourly level, produce an ACF and a lag plot at JFK. What do
you learn from these plots? (Note that you can suppress all the coloring
in the gg_lag call if you pass an additional argument,
color = 1.)
Commentary
ACF Plot: The ACF coefficients show a slow decay with a clear wave-like pattern peaking every 24 lags. This indicates the series is highly persistent (hence non-stationary) and possesses strong daily seasonality.
Lag Plot: The points form a tight, upward-sloping cluster. This visually confirms the high positive correlation – the temperature at a given hour is a reliable predictor of the temperature in the hours that immediately follow.
hourly_acf <-
weather_ts %>%
filter(origin == "JFK") %>%
fill_gaps() %>%
ACF(temp, lag_max = 48) %>%
autoplot() +
labs(
    title = "JFK Hourly Temperature ACF"
)
hourly_acf
hourly_lag <-
weather_ts %>%
filter(origin == "JFK") %>%
  gg_lag(temp, lags = 1:9, color = 1) +
  labs(title = "JFK Hourly Temperature Lag Plot")
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
hourly_lag
# Combine and display the plots
hourly_acf | hourly_lag
#hourly_acf | hourly_lag
At the weekly level, produce an ACF and a lag plot of the weekly average temperature at JFK. What do you learn from these plots?
Comment: Aggregating to a weekly average removes the daily cycle seen in the hourly data. The ACF still decays slowly, indicating that the series is non-stationary, and the sign pattern over the year reflects annual seasonality. Note, though, that with only about one year of weekly data, the estimates near lag 52 (one year) rest on very few observation pairs and should be read cautiously.
weekly_acf <-
weather_ts %>%
# fill_gaps() %>%
filter(origin == "JFK") %>%
index_by(week_index = yearweek(time_hour)) %>%
summarize(
avg_temp = mean(temp, na.rm = TRUE),
.groups = "drop"
) %>%
ACF(avg_temp, lag_max = 52) %>%
autoplot() +
labs(title = "JFK Weekly Average Temperature ACF")
weekly_acf
weekly_lag <-
weather_ts %>%
# fill_gaps() %>%
filter(origin == "JFK") %>%
index_by(week_index = yearweek(time_hour)) %>%
summarize(
avg_temp = mean(temp, na.rm = TRUE),
.groups = "drop"
) %>%
  gg_lag(avg_temp, lags = 1:9, color = 1) +
  labs(title = "JFK Weekly Average Temperature Lag Plot")
weekly_lag
At the monthly level, produce an ACF plot of the monthly average temperature at JFK. What do you learn from these plots?
Comment: The monthly ACF shows decay and hints at annual seasonality (even with only one year’s worth of data). This again points to non-stationarity.
monthly_acf <-
weather_ts %>%
# fill_gaps() %>%
filter(origin == "JFK") %>%
index_by(month_index = yearmonth(time_hour)) %>%
summarize(
    avg_temp = mean(temp, na.rm = TRUE)) %>%
ACF(avg_temp) %>%
autoplot() +
labs(
title = "JFK 2013 Monthly Avg. Temp. ACF"
)
monthly_acf
monthly_lag <-
weather_ts %>%
filter(origin == "JFK") %>%
index_by(month_index = yearmonth(time_hour)) %>%
summarize(
    avg_temp = mean(temp, na.rm = TRUE)
) %>%
  gg_lag(avg_temp, lags = 1:9, color = 1) +
  labs(title = "JFK 2013 Monthly Avg. Temp. Lag Plot")
monthly_lag
This is the last exercise for this assignment. Here, we’re going to do the same work that you have done twice before, but against “live” data that comes from the United States’ Bureau of Labor Statistics.
Recall that in the lecture, Jeffrey identifies the unemployment rate as an example of a time series. You can get to this data from the public web-site. To do so, head here:
Unemployment Rate (Seasonally Adjusted) and
Retrieve Data. Take note: when you check the
Unemployment Rate (Seasonally Adjusted), what is the series
number that is associated with it?
What do you see when you get to the next page? A rectangular data series that has months on the columns, years on the rows, and values as the contents of the cells? :facepalm:
This motivates the idea of using the BLS’ data API. The data API provides consistently formatted JSON objects that can be converted to data of an arbitrary (that is, useful to us) formatting. Because the data is being provided in a JSON object, there is some work to coerce it to be useful, but we’ll find that there are so many people who are doing this same coercion that there are ready-made wrappers that will help us to do this work.
As an example, you can view how these JSON objects are formatted by navigating to an API endpoint in your browser. Here is the endpoint for the national unemployment: [link].
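To see the raw shape without any wrapper, you can also pull the JSON yourself. A sketch using jsonlite; the URL here follows the BLS public API v2 pattern and is an assumption to verify against the BLS documentation:

```r
# Sketch: fetch the raw JSON for the overall unemployment series.
# URL pattern assumed from the BLS public data API v2 docs.
resp <- jsonlite::fromJSON(
  "https://api.bls.gov/publicAPI/v2/timeseries/data/LNS14000000"
)
# resp$Results$series$data holds the year/period/value records.
str(resp$Results, max.level = 2)
```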
Let’s pull unemployment from the BLS data API.
For this assignment, consider the following three series:
LNS14000000; LNS14000001; and LNS14000002.
Our goal is to analyze these three series for the last 20 years.
To query the BLS API, we have found the blsR
library to be the most effective (at the time that we wrote the
assignment, in 2022). Here are links to get you read in on the package.
Rather than providing you with a full walk-through for how to
use this package to manipulate the BLS data API, instead a learning goal
is for you to read these documents and come to an understanding of how
the package works.
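As a warm-up before the multi-series call below, here is a single-series sketch with blsR's get_series_table() (the column layout described in the comment is what the package documents; treat it as an assumption to verify):

```r
# Sketch: one series, one call. get_series_table() returns a tibble
# with year, period, periodName, and value columns for the series.
overall_tbl <- blsR::get_series_table(
  "LNS14000000",
  start_year = 2014, end_year = 2023
)
head(overall_tbl)
```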
Your task is to create an object called unemployment
that is a tsibble class, that contains the overall
unemployment rate, as well as the unemployment rate for male and female
people.
Your target dataframe should have the following shape but extend to the current time period.
year month time_index name value
<int> <int> <mth> <chr> <dbl>
1 2000 1 2000 Jan overall 4
2 2000 1 2000 Jan male 3.9
3 2000 1 2000 Jan female 4.1
4 2000 2 2000 Feb overall 4.1
5 2000 2 2000 Feb male 4.1
6 2000 2 2000 Feb female 4.1
7 2000 3 2000 Mar overall 4
8 2000 3 2000 Mar male 3.8
9 2000 3 2000 Mar female 4.3
10 2000 4 2000 Apr overall 3.8
I defined current_year programmatically and saved the query of the three series in one data frame.
Manually downloaded data has year and month as rows and the three series as columns, i.e., a wide format. Tidy data requires a long format, where each observation (a monthly unemployment reading) is a single row and each variable (year, month, series_id, value) is a separate column.
So I pivot the three series column names into one column with three values/factors.
Doing analysis on manually downloaded data requires a manual update (download, cleaning, and integration) every time the BLS releases new data. That process is non-reproducible, not scalable, and error-prone, making the analytic pipeline difficult to maintain. Using the blsR API package makes the process automated and programmatic.
Code was produced with help from AI, but I checked every line of code and made many edits.
Sys.setenv(BLS_KEY = "<your-BLS-API-key>")  # replace with your registered BLS API key
series_id_list <- c(
overall = "LNS14000000",
male = "LNS14000001",
female = "LNS14000002"
)
current_year <- as.numeric(format(Sys.Date(), "%Y"))
current_year
## [1] 2025
# Query the BLS API for the last 20 years (e.g., 2004 to current year)
# Note: The output is a complex list structure
raw_bls_data <- blsR::get_n_series_table(
series_id_list,
# Request 20 years of data
start_year = current_year - 20,
end_year = current_year
)
## Year 2005 to 2025 is longer than 10 year API limit. Performing 3 requests.
## Warning: api_key is required for multiple series requests.
## api_key is required for multiple series requests.
## api_key is required for multiple series requests.
dim(raw_bls_data)
## [1] 248 5
class(raw_bls_data)
## [1] "tbl_df" "tbl" "data.frame"
unemployment <- raw_bls_data %>%
# 1. Pivot the data from wide to long format
pivot_longer(
cols = starts_with("LNS"), # Select all columns starting with the Series ID prefix
names_to = "series_id",
values_to = "value"
) %>%
# 2. Convert BLS period (e.g., "M01" -> "1") into a proper date object
mutate(
# Create the date by combining year and the period (month number)
date = lubridate::ymd(paste0(year, "-", gsub("M", "", period), "-01")),
# Clean up the series IDs into readable names
name = case_when(
series_id == "LNS14000000" ~ "overall",
series_id == "LNS14000001" ~ "male",
series_id == "LNS14000002" ~ "female",
TRUE ~ series_id
)
) %>%
# 3. Finalize columns for tsibble casting
mutate(
year = lubridate::year(date),
month = lubridate::month(date),
time_index = yearmonth(date)
) %>%
# 4. Cast to tsibble
tsibble::as_tsibble(
key = name,
index = time_index
) %>%
# 5. Select the required final columns and ensure value is numeric
select(year, month, time_index, name, value) %>%
mutate(value = as.numeric(value))
head(unemployment)
## # A tsibble: 6 x 5 [1M]
## # Key: name [1]
## year month time_index name value
## <dbl> <dbl> <mth> <chr> <dbl>
## 1 2005 1 2005 Jan female 5.1
## 2 2005 2 2005 Feb female 5.3
## 3 2005 3 2005 Mar female 5.1
## 4 2005 4 2005 Apr female 5.2
## 5 2005 5 2005 May female 5.2
## 6 2005 6 2005 Jun female 5.1
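Before plotting, a quick regularity check on the assembled tsibble. has_gaps() reports, per key, whether any index values are missing (this assumes the unemployment object built above):

```r
# Sanity check: each of the three series should be a complete monthly run.
# A .gaps value of FALSE for overall, male, and female means no missing months.
tsibble::has_gaps(unemployment)
```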
Once you have queried the data and have it successfully stored in an appropriate object, produce a plot that shows the unemployment rate on the y-axis, time on the x-axis, and each of the groups (overall, male, and female) as a different colored line.
Comment: ACF Plot: The ACF coefficients start at 1 and decay slowly. This indicates non-stationarity and persistence: unemployment levels change gradually. There is no significant peak at lag 12 (or multiples of 12), which may be because the data is seasonally adjusted.
Lag Plot: The points form a mostly tight, upward-sloping cluster. This visually confirms a high positive correlation between this month’s unemployment rate and next month’s rate, showing that the series is highly predictable from its immediate past. The few long excursions indicate rare events where unemployment jumped due to exogenous shocks (Covid, the subprime financial crisis).
unemployment %>%
ggplot(aes(x = time_index, y = value, color = name)) +
geom_line(linewidth = 1) +
labs(title = "Unemployment by gender: 2005 - 2025",
x = "time")
This should feel familiar by now: Produce the ACF and lag plot of the
overall unemployment series. What do you observe?
unemployment %>%
filter(name == "overall") %>%
ACF(value, lag_max = 240) %>%
autoplot() +
labs(title = "Overall Unemployment Rate ACF")
unemployment %>%
filter(name == "overall") %>%
  gg_lag(value, lags = 1:9, color = 1) +
labs(title = "Overall Unemployment Rate Lag Plot")